DETAILED ACTION

Reopen Prosecution 

1. In view of the Appeal Brief filed on 4/28/2008, PROSECUTION IS HEREBY
REOPENED. A new ground of rejection is set forth below. 

If an appellant wishes to reinstate an appeal after prosecution is reopened, 
appellant must file a new notice of appeal in compliance with 37 CFR 41.31 and a
complete new appeal brief in compliance with 37 CFR 41.37. Any previously paid
appeal fees set forth in 37 CFR 41.20 for filing a notice of appeal, filing an appeal brief,
and requesting an oral hearing (if applicable) will be applied to the new appeal on the 
same application as long as a final Board decision has not been made on the prior 
appeal. If, however, the appeal fees have increased since they were previously paid, 
then appellant must pay the difference between the current fee(s) and the amount 
previously paid. Appellant must file a complete new appeal brief in compliance with the 
format and content requirements of 37 CFR 41.37(c) within two months from the date of
filing the new notice of appeal. See MPEP § 1205. 



Remarks 

2. Claims 12-20, 37-40 and 42-58 are pending.

Claim Rejections - 35 USC § 103

3. The following is a quotation of 35 U.S.C. 103(a) which forms the basis for all
obviousness rejections set forth in this Office action: 

(a) A patent may not be obtained though the invention is not identically disclosed or described as set 
forth in section 102 of this title, if the differences between the subject matter sought to be patented and 
the prior art are such that the subject matter as a whole would have been obvious at the time the 
invention was made to a person having ordinary skill in the art to which said subject matter pertains. 
Patentability shall not be negatived by the manner in which the invention was made. 

This application currently names joint inventors. In considering patentability of 
the claims under 35 U.S.C. 103(a), the examiner presumes that the subject matter of 
the various claims was commonly owned at the time any inventions covered therein 
were made absent any evidence to the contrary. Applicant is advised of the obligation 
under 37 CFR 1.56 to point out the inventor and invention dates of each claim that was
not commonly owned at the time a later invention was made in order for the examiner to 
consider the applicability of 35 U.S.C. 103(c) and potential 35 U.S.C. 102(e), (f) or (g) 
prior art under 35 U.S.C. 103(a).

4. Claims 12-17, 40, 42-48 and 50-55 are rejected under 35 U.S.C. 103(a) as being
unpatentable over Meyerzon et al. ('Meyerzon' hereinafter) (Patent Number 6,547,829)
in view of Cho et al. ('Cho' hereinafter) ("Finding replicated web collections," by Cho et
al., Proceedings of the ACM SIGMOD International Conference on Management of
Data, pages 355-366, 2000) and further in view of Wang et al. ('Wang' hereinafter)
("Web search services", Wang et al., University of Science and Technology, Hong
Kong, Issue Date: 2002, Series/Report no.: Computer Science Technical Report,
HKUST-CS02-26).



As per claim 12, Meyerzon teaches

A method of detecting duplicate documents in a network crawling system, 
comprising: (see abstract and background) 

constructing a plurality of tables, each table corresponding to a portion of a 
document address space (builds new index based on documents, column 4, lines 43- 
60), storing information identifying documents having a same document identifier and 
each identified document having an associated document rank; (column 2, lines 3-16) 

receiving a newly crawled document, such document characterized by a 
document identifier and a document rank; (column 2, lines 3-16) 

reading information stored in the plurality of tables to identify a set of documents, 
sharing the document identifier of the newly crawled document, and ascertaining an 
original representative document for the identified set of documents; (column 9, lines 
18-29) 

updating the information stored in at least one of the tables in accordance with 
the document ranks of the identified set of documents and the newly crawled document; 
(column 2, lines 3-16) 

determining a representative document for the newly crawled document and the 
identified set of documents, (column 9, lines 32-40)

Meyerzon does not explicitly indicate "indexing the representative document
when the representative document is the newly crawled document; and repeating the 
receiving, reading, updating, determining and indexing operations with respect to a 
plurality of newly crawled documents, each of which shares a respective document 
identifier with a respective set of documents". 

However, Cho discloses "indexing the representative document when the 
representative document is the newly crawled document; and repeating the receiving, 
reading, updating, determining and indexing operations with respect to a plurality of 
newly crawled documents, each of which shares a respective document identifier with a 
respective set of documents, such that at least some of the newly crawled documents 
are determined to be representative documents and are indexed" (newly replicated 
collection, page 365, first column, second paragraph; one page displayed or represents 
collection of duplicate document, page 365, second column, first paragraph). 

It would have been obvious to one of ordinary skill in the art at the time the 
invention was made to combine Meyerzon and Cho because using the steps of
"indexing the representative document when the representative document is the newly 
crawled document; and repeating the receiving, reading, updating, determining and 
indexing operations with respect to a plurality of newly crawled documents, each of 
which shares a respective document identifier with a respective set of documents, such 
that at least some of the newly crawled documents are determined to be representative 
documents and are indexed" would have given those skilled in the art the tools to 
improve the invention by allowing duplicate documents to be identified and represented.
This gives the user the advantage of not having multiple copies of the same document
to choose from. 

Neither Meyerzon nor Cho explicitly indicate "such that at least some of the 
newly crawled documents are determined to be representative documents and are 
indexed". 

However, Wang discloses "such that at least some of the newly crawled 
documents are determined to be representative documents and are indexed" (update 
the index where pages have changed, page 9, first and second paragraphs). 

It would have been obvious to one of ordinary skill in the art at the time the 
invention was made to combine Meyerzon, Cho and Wang because using the steps of
"such that at least some of the newly crawled documents are determined to be 
representative documents and are indexed" would have given those skilled in the art the 
tools to improve the invention by allowing changes in the web to be accurately 
implemented in search engines. This gives the user the advantage of having 
representative research results. 
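
For illustration only, the limitations of claim 12 discussed above (receiving, reading, updating, determining and indexing) can be sketched in Python as follows. The sketch is not taken from Meyerzon, Cho or Wang. The names `new_tables`, `table_for` and `process`, the CRC-based partitioning of the address space, and the rule that the highest-ranked document is the representative are all assumptions made for the example.

```python
import zlib

NUM_TABLES = 4  # hypothetical number of address-space portions


def new_tables():
    """One dict per address-space portion: doc_id -> list of (url, rank)."""
    return [dict() for _ in range(NUM_TABLES)]


def table_for(tables, url):
    """Pick the table for a URL with a stable hash (an assumed partitioning)."""
    return tables[zlib.crc32(url.encode("utf-8")) % len(tables)]


def process(tables, index, url, doc_id, rank):
    """Handle one newly crawled document and return its representative URL."""
    # Reading: collect every known document sharing this identifier.
    known = [entry for t in tables for entry in t.get(doc_id, [])]
    # Assumption: the original representative is the highest-ranked known one.
    original = max(known, key=lambda e: e[1], default=None)

    # Updating: record the new document in the table for its address.
    table_for(tables, url).setdefault(doc_id, []).append((url, rank))

    # Determining: the new document wins only with a strictly higher rank.
    if original is None or rank > original[1]:
        representative = url
    else:
        representative = original[0]

    # Indexing: only when the representative is the newly crawled document.
    if representative == url:
        index.add(url)
    return representative
```

Calling `process` once per crawled document corresponds to the repeating step. The sketch does not remove a displaced representative from `index`; whether that should happen is left open here.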

As per claim 13, Meyerzon teaches

information identifying the identified set of documents, including a particular 
document serving as the original representative document of the identified set, is stored 
in one or more tables, (column 9, lines 32-40) 



As per claim 14, Meyerzon teaches

the determining includes comparing the document rank of the newly crawled
document with that of the particular document from the identified set in accordance with 
a set of predefined comparison criteria; selecting the newly crawled document as the 
representative document if the set of predefined comparison criteria are met; (column 5, 
lines 20-40) 

and keeping the particular document as the representative document if the set of 
predefined comparison criteria is not met. (column 2, lines 32-40) 

As per claim 15, Meyerzon teaches

the set of predefined comparison criteria comprise at least two parameters, one 
parameter for comparison with an absolute difference of document ranks between the 
newly crawled document and the particular document, and another parameter for 
comparison with a ratio of document ranks between the newly crawled document and 
the particular document, (column 5, lines 20-40) 
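
As an illustration of the two-parameter criteria recited in claim 15, a minimal Python sketch follows. It is not drawn from any cited reference. The function name `should_replace`, the threshold values, the reading of the difference as new rank minus current rank, and the requirement that both tests pass are assumptions made for the example.

```python
def should_replace(new_rank, current_rank, min_diff=0.1, min_ratio=1.5):
    """Return True if the newly crawled document should become representative.

    min_diff is compared with the difference of the two ranks and
    min_ratio with their ratio; both thresholds are hypothetical values.
    """
    diff = new_rank - current_rank
    if current_rank <= 0:
        # The ratio is undefined here, so fall back to the difference test.
        return diff >= min_diff
    return diff >= min_diff and new_rank / current_rank >= min_ratio
```

With these assumed thresholds, a new rank of 0.9 against a current rank of 0.5 passes both tests, while 10.0 against 9.0 passes the difference test but fails the ratio test, so the current representative is kept.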

As per claim 16, Meyerzon teaches

the updating includes inserting information identifying the newly crawled 
document into the at least one table only when a predefined insertion condition is 
satisfied, (column 9, lines 32-40) 



As per claim 17, Meyerzon teaches

the predefined insertion condition is that the document rank of the newly crawled
document is higher than the document rank of at least one document in the identified 
set of documents, (column 2, lines 32-40) 

As per claim 40, Meyerzon teaches 

A computer program product for use in conjunction with a computer system, the 
computer program product comprising a computer readable storage medium and a 
computer program mechanism embedded therein, the computer program mechanism 
comprising: (see abstract and background) 

instructions for constructing a plurality of data structures for storing information of 
documents (builds new index based on documents, column 4, lines 43-60), each 
document characterized by a document identifier and a document rank, the information 
stored in the plurality of data structures include the document identifier and a document 
rank for each document; (URL in history table and CID in separate CID table, column 2, 
lines 64 through column 3, line 22) 

instructions for receiving a requesting document in association with its document 
identifier and document rank; (column 2, lines 3-16) 

instructions for selecting from the plurality of data structures a set of documents 
sharing the same document identifier as the requesting document, and ascertaining an 
original representative document for the identified set of documents; (column 9, lines 
18-40)

instructions for generating a new set of documents from the requesting document
and the selected set of documents in accordance with their document rank; (column 2, 
lines 3-16) 

instructions for identifying a representative document of the new set of 
documents, (column 9, lines 32-40) 

Meyerzon does not explicitly indicate "instructions for indexing the representative
document when said representative document is the newly crawled document; and 
instructions for repeating the receiving, reading, updating, determining and indexing 
operations with respect to a plurality of newly crawled documents, each of which shares 
a respective document identifier with a respective set of documents". 

However, Cho discloses "instructions for indexing the representative document 
when said representative document is the newly crawled document; and instructions for 
repeating the receiving, reading, updating, determining and indexing operations with 
respect to a plurality of newly crawled documents, each of which shares a respective 
document identifier with a respective set of documents, such that at least some of the 
newly crawled documents are determined to be representative documents and are 
indexed" (newly replicated collection, page 365, first column, second paragraph; one 
page displayed or represents collection of duplicate document, page 365, second 
column, first paragraph). 

It would have been obvious to one of ordinary skill in the art at the time the 
invention was made to combine Meyerzon and Cho because using the steps of
"instructions for indexing the representative document when said representative
document is the newly crawled document; and instructions for repeating the receiving,
reading, updating, determining and indexing operations with respect to a plurality of 
newly crawled documents, each of which shares a respective document identifier with a 
respective set of documents, such that at least some of the newly crawled documents 
are determined to be representative documents and are indexed" would have given 
those skilled in the art the tools to improve the invention by allowing duplicate 
documents to be identified and represented. This gives the user the advantage of not 
having multiple copies of the same document to choose from. 

Neither Meyerzon nor Cho explicitly indicate "such that at least some of the 
newly crawled documents are determined to be representative documents and are 
indexed". 

However, Wang discloses "such that at least some of the newly crawled 
documents are determined to be representative documents and are indexed" (update 
the index where pages have changed, page 9, first and second paragraphs). 

It would have been obvious to one of ordinary skill in the art at the time the 
invention was made to combine Meyerzon, Cho and Wang because using the steps of
"such that at least some of the newly crawled documents are determined to be 
representative documents and are indexed" would have given those skilled in the art the 
tools to improve the invention by allowing changes in the web to be accurately 
implemented in search engines. This gives the user the advantage of having 
representative research results.

As per claim 42, Meyerzon teaches

the plurality of data structures include a data structure for storing information of 
multiple sets of documents, each set of documents sharing a same document content, 
(column 2, line 64 through column 3, line 22) 

As per claim 43, Meyerzon teaches 

the plurality of data structures include a data structure for storing information of 
multiple sets of documents, each set of documents sharing a same document address, 
(storage location, column 2, line 64 through column 3, line 22) 

As per claim 44, Meyerzon teaches 

the document identifier is a fixed length fingerprint of document content of a 
document characterized by the document identifier, (content identifier, column 2, line 64 
through column 3, line 22) 

As per claim 45, Meyerzon teaches

the document identifier is a fixed length fingerprint of an address of a document 
characterized by the document identifier, (content identifier, column 2, line 64 through 
column 3, line 22) 



As per claim 46, Meyerzon teaches

the generating instructions include sorting the requesting document and the
selected set of documents in accordance with a metric included in score information of 
the requesting document and selected set of documents; and selecting a new set of 
documents, having at most a predefined number of documents, from the requesting 
document and the selected set of documents based on the sorting result, (column 2, 
lines 3-16) 

As per claim 47, Meyerzon teaches

the score information for each document includes a document rank; (column 2, 
lines 3-16) 

and the identifying instructions include comparing the document rank of the 
requesting document with that of a particular document from the selected set of 
documents in accordance with a set of predefined comparison criteria, wherein the 
particular document was previously determined to be the representative document for 
the selected set of documents; (column 5, lines 20-40) 

selecting the requesting document as the representative document for the new 
set of documents if the set of predefined comparison criteria are met; (column 2, lines 
32-40) 

and keeping the particular document as the representative document for the new 
set of documents if the set of predefined comparison criteria is not met. (column 2, lines 
32-40)

As per claim 48, Meyerzon teaches

the set of predefined comparison criteria comprise at least two parameters, one 
parameter for comparison with an absolute difference of document rank between the 
requesting document and the particular document, and another parameter for 
comparison with a ratio of document rank between the requesting document and the 
particular document (column 8, lines 39-61). 

As per claims 50-55, 

These claims are rejected on grounds corresponding to the arguments given 
above for rejected claims 12-17 and are similarly rejected. 



5. Claims 18-20, 37-39 and 56-58 are rejected under 35 U.S.C. 103(a) as being
unpatentable over Meyerzon et al. ('Meyerzon' hereinafter) (Patent Number 6,547,829)
in view of Cho et al. ('Cho' hereinafter) ("Finding replicated web collections," by Cho et
al., Proceedings of the ACM SIGMOD International Conference on Management of
Data, pages 355-366, 2000) and further in view of Rujan et al. ('Rujan' hereinafter)
(Patent Number 6,976,207) and further in view of Wang et al. ('Wang' hereinafter)
("Web search services", Wang et al., University of Science and Technology, Hong
Kong, Issue Date: 2002, Series/Report no.: Computer Science Technical Report,
HKUST-CS02-26).

As per claim 18, Meyerzon teaches

A method of detecting duplicate documents in a network crawling system, 
comprising: (see abstract and background) 

constructing a plurality of tables, each table corresponding to a segment of a 
document address space, storing information identifying documents having a same 
document identifier and each identified document having an associated document rank, 
wherein the plurality of tables comprise N+1 tables where N is an integer greater than 
one, wherein the N+1 tables comprise N tables, each generated during a respective 
phase of a set of N crawling phases, and a current table generated during a current one 
of the N crawling phases, wherein an oldest one of the N tables was generated during a 
previous instance of the current crawling phase; (column 4, lines 43-60) 

receiving a newly crawled document, such document characterized by a 
document identifier and a document rank; (column 2, lines 3-16) 

reading information stored in the N+1 tables to identify a set of documents 
sharing the document identifier of the newly crawled document, and ascertaining an 
original representative document for the identified set of documents; (column 4, lines 
43-60) 

updating the information stored in the current table in accordance with the 
document rankings of the identified set of documents and the newly crawled document; 
(column 4, line 43 through column 5, line 13) 

determining a representative document for the newly crawled document and the 
identified set of documents; (column 2, lines 32-40)

and upon completion of the current crawling phase, ... of the N tables, (column 5,
lines 1-20) 

Meyerzon does not explicitly indicate "indexing the representative document
when said representative document is the newly crawled document; repeating the
receiving, reading, updating, determining and indexing operations with respect to a 
plurality of newly crawled documents, each of which shares a respective document 
identifier with a respective set of documents". 

However, Cho discloses "indexing the representative document when said 
representative document is the newly crawled document; repeating the receiving,
reading, updating, determining and indexing operations with respect to a plurality of 
newly crawled documents, each of which shares a respective document identifier with a 
respective set of documents, such that at least some of the newly crawled documents 
are determined to be representative documents and are indexed" (newly replicated 
collection, page 365, first column, second paragraph; one page displayed or represents 
collection of duplicate document, page 365, second column, first paragraph). 

It would have been obvious to one of ordinary skill in the art at the time the 
invention was made to combine Meyerzon and Cho because using the steps of
"indexing the representative document when said representative document is the newly
crawled document; repeating the receiving, reading, updating, determining and
indexing operations with respect to a plurality of newly crawled documents, each of
which shares a respective document identifier with a respective set of documents, such
that at least some of the newly crawled documents are determined to be representative
documents and are indexed" would have given those skilled in the art the tools to
improve the invention by allowing duplicate documents to be identified and represented. 
This gives the user the advantage of not having multiple copies of the same document 
to choose from. 

Neither Meyerzon nor Cho explicitly indicate "retiring the oldest one". 

However, Rujan discloses "retiring the oldest one" (column 15, lines 20-25).

It would have been obvious to one of ordinary skill in the art to combine 
Meyerzon, Cho and Rujan because using the steps of "retiring the oldest one" would
have given those skilled in the art the tools to create an effective information storage 
and retrieval system. This gives the user the advantage of keeping a limited amount of 
historic information. 
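
For illustration only, the N+1 table arrangement of claim 18, including retiring the oldest table, can be sketched in Python as follows. The sketch is not taken from Meyerzon, Cho or Rujan. The class name `PhaseTables` and its methods are hypothetical, and the choice to retire a table only once N completed tables already exist is an assumption made for the example.

```python
from collections import deque


class PhaseTables:
    """Keeps up to N completed per-phase tables plus one current table."""

    def __init__(self, n):
        if n <= 1:
            raise ValueError("N must be an integer greater than one")
        self.n = n
        self.completed = deque()  # oldest completed table on the left
        self.current = {}         # doc_id -> list of (url, rank)

    def record(self, doc_id, url, rank):
        """Store a newly crawled document in the current table only."""
        self.current.setdefault(doc_id, []).append((url, rank))

    def lookup(self, doc_id):
        """Read all N+1 tables for documents sharing this identifier."""
        hits = []
        for table in [*self.completed, self.current]:
            hits.extend(table.get(doc_id, []))
        return hits

    def finish_phase(self):
        """Close the current phase and return the retired table, if any."""
        retired = None
        if len(self.completed) == self.n:
            retired = self.completed.popleft()  # retire the oldest one
        self.completed.append(self.current)
        self.current = {}
        return retired
```

With N = 2, the first two calls to `finish_phase` retire nothing; the third returns the table from the first phase, after which lookups no longer see its entries.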

Neither Meyerzon, Cho nor Rujan explicitly indicate "such that at least some of
the newly crawled documents are determined to be representative documents and are 
indexed". 

However, Wang discloses "such that at least some of the newly crawled 
documents are determined to be representative documents and are indexed" (update 
the index where pages have changed, page 9, first and second paragraphs). 

It would have been obvious to one of ordinary skill in the art at the time the 
invention was made to combine Meyerzon, Cho, Rujan and Wang because using the
steps of "such that at least some of the newly crawled documents are determined to be
representative documents and are indexed" would have given those skilled in the art the
tools to improve the invention by allowing changes in the web to be accurately
implemented in search engines. This gives the user the advantage of having
representative research results. 

As per claim 19, Meyerzon teaches

the reading comprises reading from a merged table that stores information from a 
plurality of the N tables, and reading from the current table (column 4, lines 43-60). 

As per claim 20, Meyerzon teaches

information identifying the identified set of documents, including a particular 
document serving as the original representative document of the identified set, is stored 
in one or more tables (column 9, lines 32-40).

As per claims 37-39, 

These claims are rejected on grounds corresponding to the arguments given 
above for rejected claims 18-20 and are similarly rejected. 

As per claims 56-58, 

These claims are rejected on grounds corresponding to the arguments given 
above for rejected claims 18-20 and are similarly rejected.

6. Claim 49 is rejected under 35 U.S.C. 103(a) as being unpatentable over
Meyerzon et al. ('Meyerzon' hereinafter) (Patent Number 6,547,829) in view of Cho et
al. ('Cho' hereinafter) ("Finding replicated web collections," by Cho et al., Proceedings
of the ACM SIGMOD International Conference on Management of Data, pages 355- 
366, 2000) and further in view of Wang et al. (' Wang ' hereinafter) ("Web search 
services", Wang et al., University of Science and Technology, Hong Kong, Issue Date: 
2002, Series/Report no.: Computer Science Technical Report, HKUST-CS02-26) and 
further in view of Lambert et al. (' Lambert ' hereinafter) (Patent Number 6,976,207). 

As per claim 49,

Neither Meyerzon, Cho nor Wang explicitly indicate "a document is a temporary
redirect page comprising a document content, a source document address, and a
target document address". 

However, Lambert discloses "a document is a temporary redirect page 
comprising a document content, a source document address, and a target document
address" (paragraph [0057]). 

It would have been obvious to one of ordinary skill in the art to combine 
Meyerzon, Cho, Wang and Lambert because using the steps of "a document is a
temporary redirect page comprising a document content, a source document address, 
and a target document address" would have given those skilled in the art the tools to 
accurately represent web sites and the content that they hold. This gives the user the 
advantage of recognizing web page structure.

Response to Arguments

7. Applicant's arguments filed 4/28/2008 have been fully considered but they are 
not persuasive. 

8. Applicant argues that neither Meyerzon nor Cho disclose "indexing the
representative document when the representative document is the newly crawled 
document; and repeating the receiving, reading, updating, determining and indexing 
operations with respect to a plurality of newly crawled documents, each of which shares 
a respective document identifier with a respective set of documents, such that at least 
some of the newly crawled documents are determined to be representative documents 
and are indexed". Applicant further argues that Meyerzon looks for a document that 
exists in a history table and does not index the document if it does exist, and therefore 
newly crawled documents are never used as the representative document. The 
Applicant similarly argues against the Cho reference teaching the same subject matter 
and has the same deficiencies as Meyerzon. Respectfully, it is noted that the newly
cited Wang reference discloses that a crawler can update an index and that pages may 
have changed (page 9, first and second paragraph), which solves the shortcomings 
described by the Applicant. Therefore the concept of a representative document is 
taught in Meyerzon and Cho references minus rebuilding of an index, but Wang teaches
this limitation. Therefore the new rejections presented as combinations of Meyerzon,
Cho and Wang (in claims 12-17, 40, 42-49 and 50-55), or Meyerzon, Cho, Rujan and
Wang (in claims 18-20, 37-39 and 56-58), respectively disclose the limitation. 



Conclusion 

9. The prior art made of record, listed on form PTO-892, and not relied upon is 
considered pertinent to applicant's disclosure. 

Any inquiry concerning this communication or earlier communications from the 
examiner should be directed to Jay A. Morrison whose telephone number is (571) 272-7112.
The examiner can normally be reached on M-F 8-4:30.

If attempts to reach the examiner by telephone are unsuccessful, the examiner's 
supervisor, Tim Vo, can be reached on (571) 272-3642. The fax phone number for the
organization where this application or proceeding is assigned is 571-273-8300.

Information regarding the status of an application may be obtained from the
Patent Application Information Retrieval (PAIR) system. Status information for 
published applications may be obtained from either Private PAIR or Public PAIR. 
Status information for unpublished applications is available through Private PAIR only. 
For more information about the PAIR system, see http://pair-direct.uspto.gov. Should 
you have questions on access to the Private PAIR system, contact the Electronic 
Business Center (EBC) at 866-217-9197 (toll-free). 



July 7, 2008 



Jay Morrison Tim Vo 

TC2100 TC2100 



/Pierre M. Vital/ 

Supervisory Patent Examiner, Art Unit 2169 



